185 research outputs found

    Avoiding overfitting of multilayer perceptrons by training derivatives

    Full text link
    Resistance to overfitting is observed for neural networks trained with extended backpropagation algorithm. In addition to target values, its cost function uses derivatives of those up to the 4th4^{\mathrm{th}} order. For common applications of neural networks, high order derivatives are not readily available, so simpler cases are considered: training network to approximate analytical function inside 2D and 5D domains and solving Poisson equation inside a 2D circle. For function approximation, the cost is a sum of squared differences between output and target as well as their derivatives with respect to the input. Differential equations are usually solved by putting a multilayer perceptron in place of unknown function and training its weights, so that equation holds within some margin of error. Commonly used cost is the equation's residual squared. Added terms are squared derivatives of said residual with respect to the independent variables. To investigate overfitting, the cost is minimized for points of regular grids with various spacing, and its root mean is compared with its value on much denser test set. Fully connected perceptrons with six hidden layers and 21042\cdot10^{4}, 11061\cdot10^{6} and 51065\cdot10^{6} weights in total are trained with Rprop until cost changes by less than 10% for last 1000 epochs, or when the 10000th10000^{\mathrm{th}} epoch is reached. Training the network with 51065\cdot10^{6} weights to represent simple 2D function using 10 points with 8 extra derivatives in each produces cost test to train ratio of 1.51.5, whereas for classical backpropagation in comparable conditions this ratio is 21042\cdot10^{4}

    Computational exploration of molecular receptive fields in the olfactory bulb reveals a glomerulus-centric chemical map

    Get PDF
    © The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.Progress in olfactory research is currently hampered by incomplete knowledge about chemical receptive ranges of primary receptors. Moreover, the chemical logic underlying the arrangement of computational units in the olfactory bulb has still not been resolved. We undertook a large-scale approach at characterising molecular receptive ranges (MRRs) of glomeruli in the dorsal olfactory bulb (dOB) innervated by the MOR18-2 olfactory receptor, also known as Olfr78, with human ortholog OR51E2. Guided by an iterative approach that combined biological screening and machine learning, we selected 214 odorants to characterise the response of MOR18-2 and its neighbouring glomeruli. We found that a combination of conventional physico-chemical and vibrational molecular descriptors performed best in predicting glomerular responses using nonlinear Support-Vector Regression. We also discovered several previously unknown odorants activating MOR18-2 glomeruli, and obtained detailed MRRs of MOR18-2 glomeruli and their neighbours. Our results confirm earlier findings that demonstrated tunotopy, that is, glomeruli with similar tuning curves tend to be located in spatial proximity in the dOB. In addition, our results indicate chemotopy, that is, a preference for glomeruli with similar physico-chemical MRR descriptions being located in spatial proximity. Together, these findings suggest the existence of a partial chemical map underlying glomerular arrangement in the dOB. Our methodology that combines machine learning and physiological measurements lights the way towards future high-throughput studies to deorphanise and characterise structure-activity relationships in olfaction.Peer reviewe

    Could the clinical interpretability of subgroups detected using clustering methods be improved by using a novel two-stage approach?

    Get PDF
    Background: Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. Methods: This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. Results: The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. Conclusions: In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. Research designs, statistical methods and outcome metrics suitable for performing that testing are also described. This approach has potential benefits but requires broad testing, in multiple patient samples, to determine its clinical value. The usefulness of the approach is likely to be context-specific, depending on the characteristics of the available data and the research question being asked of it

    Challenges Predicting Ligand-Receptor Interactions of Promiscuous Proteins: The Nuclear Receptor PXR

    Get PDF
    Transcriptional regulation of some genes involved in xenobiotic detoxification and apoptosis is performed via the human pregnane X receptor (PXR) which in turn is activated by structurally diverse agonists including steroid hormones. Activation of PXR has the potential to initiate adverse effects, altering drug pharmacokinetics or perturbing physiological processes. Reliable computational prediction of PXR agonists would be valuable for pharmaceutical and toxicological research. There has been limited success with structure-based modeling approaches to predict human PXR activators. Slightly better success has been achieved with ligand-based modeling methods including quantitative structure-activity relationship (QSAR) analysis, pharmacophore modeling and machine learning. In this study, we present a comprehensive analysis focused on prediction of 115 steroids for ligand binding activity towards human PXR. Six crystal structures were used as templates for docking and ligand-based modeling approaches (two-, three-, four- and five-dimensional analyses). The best success at external prediction was achieved with 5D-QSAR. Bayesian models with FCFP_6 descriptors were validated after leaving a large percentage of the dataset out and using an external test set. Docking of ligands to the PXR structure co-crystallized with hyperforin had the best statistics for this method. Sulfated steroids (which are activators) were consistently predicted as non-activators while, poorly predicted steroids were docked in a reverse mode compared to 5α-androstan-3β-ol. Modeling of human PXR represents a complex challenge by virtue of the large, flexible ligand-binding cavity. This study emphasizes this aspect, illustrating modest success using the largest quantitative data set to date and multiple modeling approaches

    A new framework for sign language alphabet hand posture recognition using geometrical features through artificial neural network (part 1)

    Get PDF
    Hand pose tracking is essential in sign languages. An automatic recognition of performed hand signs facilitates a number of applications, especially for people with speech impairment to communication with normal people. This framework which is called ASLNN proposes a new hand posture recognition technique for the American sign language alphabet based on the neural network which works on the geometrical feature extraction of hands. A user’s hand is captured by a three-dimensional depth-based sensor camera; consequently, the hand is segmented according to the depth analysis features. The proposed system is called depth-based geometrical sign language recognition as named DGSLR. The DGSLR adopted in easier hand segmentation approach, which is further used in segmentation applications. The proposed geometrical feature extraction framework improves the accuracy of recognition due to unchangeable features against hand orientation compared to discrete cosine transform and moment invariant. The findings of the iterations demonstrate the combination of the extracted features resulted to improved accuracy rates. Then, an artificial neural network is used to drive desired outcomes. ASLNN is proficient to hand posture recognition and provides accuracy up to 96.78% which will be discussed on the additional paper of this authors in this journal

    Hydrophobicity and Charge Shape Cellular Metabolite Concentrations

    Get PDF
    What governs the concentrations of metabolites within living cells? Beyond specific metabolic and enzymatic considerations, are there global trends that affect their values? We hypothesize that the physico-chemical properties of metabolites considerably affect their in-vivo concentrations. The recently achieved experimental capability to measure the concentrations of many metabolites simultaneously has made the testing of this hypothesis possible. Here, we analyze such recently available data sets of metabolite concentrations within E. coli, S. cerevisiae, B. subtilis and human. Overall, these data sets encompass more than twenty conditions, each containing dozens (28-108) of simultaneously measured metabolites. We test for correlations with various physico-chemical properties and find that the number of charged atoms, non-polar surface area, lipophilicity and solubility consistently correlate with concentration. In most data sets, a change in one of these properties elicits a ∼100 fold increase in metabolite concentrations. We find that the non-polar surface area and number of charged atoms account for almost half of the variation in concentrations in the most reliable and comprehensive data set. Analyzing specific groups of metabolites, such as amino-acids or phosphorylated nucleotides, reveals even a higher dependence of concentration on hydrophobicity. We suggest that these findings can be explained by evolutionary constraints imposed on metabolite concentrations and discuss possible selective pressures that can account for them. These include the reduction of solute leakage through the lipid membrane, avoidance of deleterious aggregates and reduction of non-specific hydrophobic binding. By highlighting the global constraints imposed on metabolic pathways, future research could shed light onto aspects of biochemical evolution and the chemical constraints that bound metabolic engineering efforts

    CLUSS: Clustering of protein sequences based on a new similarity measure

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "<it>phylogenetic</it>" in the sense of "<it>relatedness of biological functions</it>".</p> <p>Results</p> <p>To show the effectiveness of CLUSS, we performed an extensive clustering on COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins with similar functional activity.</p> <p>Conclusion</p> <p>We have developed an effective method and tool for clustering protein sequences to meet the needs of biologists in terms of phylogenetic analysis and prediction of biological functions. Compared to existing clustering methods, CLUSS more accurately highlights the functional characteristics of the clustered families. It provides biologists with a new and plausible instrument for the analysis of protein sequences, especially those that cause problems for the alignment-dependent algorithms.</p

    Novel Functional MAR Elements of Double Minute Chromosomes in Human Ovarian Cells Capable of Enhancing Gene Expression

    Get PDF
    Double minute chromosomes or double minutes (DMs) are cytogenetic hallmarks of extrachromosomal genomic amplification and play a critical role in tumorigenesis. Amplified copies of oncogenes in DMs have been associated with increased growth and survival of cancer cells but DNA sequences in DMs which are mostly non-coding remain to be characterized. Following sequencing and bioinformatics analyses, we have found 5 novel matrix attachment regions (MARs) in a 682 kb DM in the human ovarian cancer cell line, UACC-1598. By electrophoretic mobility shift assay (EMSA), we determined that all 5 MARs interact with the nuclear matrix in vitro. Furthermore, qPCR analysis revealed that these MARs associate with the nuclear matrix in vivo, indicating that they are functional. Transfection of MARs constructs into human embryonic kidney 293T cells showed significant enhancement of gene expression as measured by luciferase assay, suggesting that the identified MARS, particularly MARs 1 to 4, regulate their target genes in vivo and are potentially involved in DM-mediated oncogene activation
    corecore